This vignette shows sample output from the
hubVis package, one of the Hubverse tools. It demonstrates
how to plot model projections supplied as quantiles.
For more information about the Hubverse standard format, please refer to the HubDocs website.
library(hubVis)
library(hubUtils)
Plots are generated with the plot_step_ahead_model_output() function.
Two datasets are used as examples:
example_round1.csv: example model output for the
round with origin date “2021-03-07” (called “round 1”) and
target “incident case”, at the US national level, from the example-complex-scenario-hub.
The dataset also contains an ensemble calculated by applying the
function:
hubEnsembles::simple_ensemble(df_round1, agg_fun = "median")
rsvnet_hospitalization.csv: example target data
from the rsv-scenario-modeling-hub repository
projection_path <- "./sample/teamsam-modelple/2023-11-19-teamsam-modelple.parquet"
projection_data <- arrow::read_parquet(projection_path)
head(projection_data)
#> # A tibble: 6 × 8
#> origin_date target horizon location age_group output_type output_type_id value
#> <date> <chr> <int> <chr> <chr> <chr> <dbl> <dbl>
#> 1 2023-11-19 inc h… 1 US 0-0.99 quantile 0.01 3.83
#> 2 2023-11-19 inc h… 1 US 0-0.99 quantile 0.025 5.73
#> 3 2023-11-19 inc h… 1 US 0-0.99 quantile 0.05 8.90
#> 4 2023-11-19 inc h… 1 US 0-0.99 quantile 0.1 15.2
#> 5 2023-11-19 inc h… 1 US 0-0.99 quantile 0.15 21.6
#> 6 2023-11-19 inc h… 1 US 0-0.99 quantile 0.2 27.9
truth_path <- "../target-data/rsvnet_hospitalization.csv"
truth_data <- read.csv(truth_path, stringsAsFactors = FALSE)
head(truth_data)
#> location date age_group target value population
#> 1 47 2016-10-08 18-130 rate hosp 0.200000 5126526
#> 2 47 2016-10-08 18-130 inc hosp 10.253052 5126526
#> 3 41 2016-10-08 65-130 rate hosp 0.400000 681767
#> 4 41 2016-10-08 65-130 inc hosp 2.727068 681767
#> 5 27 2016-10-08 18-49 rate hosp 0.000000 2280031
#> 6 27 2016-10-08 18-49 inc hosp 0.000000 2280031
The model output data in the projection_data object
follows the structure of the
model_out_tbl class. The dataset read in above is converted to a
model_out_tbl object below with as_model_out_tbl(). In addition
to the standard requirements for this class, the
plot_step_ahead_model_output() function in
hubVis requires a column holding the variable to use
for the x-axis of a
“step ahead” plot. In general, this should be a date variable giving
the date that a particular prediction targets.
By default the function looks for the "target_date"
column, although this can be overridden by specifying a different
column with the x_col_name argument. In our example data,
this column does not exist, so we add it below:
# Keep the "inc hosp" target and the "0-130" age group
projection_data_a <- dplyr::filter(
  projection_data,
  target == "inc hosp",
  age_group == "0-130"
)
# Add the target date (last day of each weekly horizon) and a model identifier
projection_data_ab <- dplyr::mutate(
  projection_data_a,
  target_date = as.Date(origin_date) + (horizon * 7) - 1,
  model_id = "teamsam-modelple1"
)
projection_data_ab <- as_model_out_tbl(projection_data_ab)
head(projection_data_ab)
#> # A tibble: 6 × 10
#> model_id origin_date target horizon location age_group target_date output_type
#> <chr> <date> <chr> <int> <chr> <chr> <date> <chr>
#> 1 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> 2 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> 3 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> 4 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> 5 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> 6 teamsam… 2023-11-19 inc h… 1 US 0-130 2023-11-25 quantile
#> # ℹ 2 more variables: output_type_id <dbl>, value <dbl>
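As a quick check of the date arithmetic used above, the following base R sketch computes the target date for a few horizons. It assumes, as the code above does, that horizons are counted in weeks and that the target date is the last day of the projected week. The values and variable names are illustrative only.

```r
# Illustrative values only: one origin date and a few weekly horizons.
origin_date <- as.Date("2023-11-19")
horizon <- c(1, 2, 4)

# Each horizon spans 7 days; subtracting 1 gives the last day of that week.
target_date <- origin_date + (horizon * 7) - 1
format(target_date)
#> [1] "2023-11-25" "2023-12-02" "2023-12-16"
```

The first value matches the target_date shown in the output above for horizon 1.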
truth_data <- dplyr::filter(truth_data, target == "inc hosp", age_group == "0-130")
truth_data <- dplyr::mutate(truth_data, time_idx = date)
head(truth_data)
#> location date age_group target value population time_idx
#> 1 08 2018-10-06 0-130 inc hosp 0.000000 5661221 2018-10-06
#> 2 47 2018-10-06 0-130 inc hosp 6.757828 6757828 2018-10-06
#> 3 49 2018-10-06 0-130 inc hosp 3.150318 3150318 2018-10-06
#> 4 36 2018-10-06 0-130 inc hosp 0.000000 19519158 2018-10-06
#> 5 35 2018-10-06 0-130 inc hosp 0.000000 2082103 2018-10-06
#> 6 27 2018-10-06 0-130 inc hosp 5.606626 5606626 2018-10-06
The plotting function requires only two parameters:
model_output_data: a
model_out_tbl object containing all the Hubverse
standard columns, including the "target_date" and
"model_id" columns. Because all model output in
model_output_data is plotted, any filtering must happen before calling
the function.
truth_data: a data.frame object
containing the ground truth data, including the columns
"time_idx" and "value".
The projection_data and truth_data objects contain
information for multiple locations and scenarios.
To plot the model projections for the US, with no scenario ID:
# Pre-filtering
projection_data_A_us <- dplyr::filter(projection_data_ab, location == "US")
# Limit the time_idx range for layout reasons
truth_data_us <- dplyr::filter(
  truth_data,
  location == "US",
  time_idx < min(projection_data_ab$target_date),
  time_idx > "2023-06-01"
)
plot_step_ahead_model_output(projection_data_A_us, truth_data_us)
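Before calling the plotting function, it can help to confirm that both inputs carry the columns listed earlier. The sketch below uses a hypothetical helper, missing_cols(), which is not part of hubVis or hubUtils, and toy data frames that stand in for the real inputs.

```r
# Hypothetical helper (not part of hubVis): list required columns that are
# absent from a data frame.
missing_cols <- function(df, required) {
  setdiff(required, names(df))
}

# Toy inputs standing in for the real model output and truth data.
toy_model_output <- data.frame(
  model_id = "model-a",
  target_date = as.Date("2023-11-25"),
  output_type = "quantile",
  output_type_id = 0.5,
  value = 10
)
toy_truth <- data.frame(date = as.Date("2023-11-18"), value = 8)

missing_cols(toy_model_output, c("model_id", "target_date"))
#> character(0)
missing_cols(toy_truth, c("time_idx", "value"))
#> [1] "time_idx"
```

Here the toy truth data lacks "time_idx", so it would need either a new column or the x_truth_col_name argument before plotting.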
truth_data <- dplyr::filter(
  truth_data,
  time_idx < min(projection_data_ab$target_date),
  time_idx > "2023-06-01"
)
plot_step_ahead_model_output(
  projection_data_ab, truth_data,
  use_median_as_point = TRUE,
  facet = "location", facet_scales = "free_x",
  facet_nrow = 4, facet_title = "top left", show_legend = FALSE
)
Several layout updates are possible:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
plot_truth = FALSE)
Change palette color and behavior:
RColorBrewer::display.brewer.all()
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
pal_color = "Dark2")
It is possible to use only shades of blue for all models by setting the
pal_color parameter to NULL. This can be
especially useful when plotting many models while
highlighting the ensemble forecast with the ens_name and
ens_color arguments.
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8,
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)
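To make the intervals = 0.8 setting concrete, the sketch below works out which quantile levels bound a central 80% interval and picks them from a toy quantile table. It assumes that an interval is drawn from the matching pair of quantiles in the model output; the toy data and the near() helper are illustrative and not part of hubVis.

```r
# Central interval coverage, as passed to `intervals` in the call above.
coverage <- 0.8

# A central interval leaves equal probability in each tail,
# so 80% coverage runs from the 0.1 quantile to the 0.9 quantile.
lower <- (1 - coverage) / 2
upper <- 1 - lower

# Toy quantile output for one model, date, and location (made-up values).
toy_quantiles <- data.frame(
  output_type_id = c(0.1, 0.5, 0.9),
  value = c(12, 20, 31)
)

# Compare with a tolerance to avoid floating-point mismatches.
near <- function(x, y) abs(x - y) < 1e-8
band <- toy_quantiles[near(toy_quantiles$output_type_id, lower) |
                        near(toy_quantiles$output_type_id, upper), ]
band$value
#> [1] 12 31
```

If the model output lacks the quantile levels implied by the requested coverage, the corresponding interval cannot be built from the data.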
The default blue color can be changed with the one_color
parameter:
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
intervals = 0.8, one_color = "orange",
ens_name = "hub-ensemble", ens_color = "black",
pal_color = NULL, use_median_as_point = TRUE)
plot_step_ahead_model_output(projection_data_A_us, truth_data_us,
interactive = FALSE)
The input data frames can use different column names for the date
information. In this case, the x_col_name and
x_truth_col_name parameters indicate the
variables that should be mapped to the x-axis.
names(truth_data_us)[names(truth_data_us) == "time_idx"] <- "time"
names(projection_data_A_us)[names(projection_data_A_us) == "target_date"] <- "date"
plot_step_ahead_model_output(
  projection_data_A_us, truth_data_us,
  x_col_name = "date", x_truth_col_name = "time"
)